We present two examples of finite-alphabet, infinite excess entropy processes generated by invariant hidden Markov models (HMMs) with countable state sets. The first, simpler example is not ergodic, but the second is. It appears these are the first constructions of processes of this type. Previous examples of infinite excess entropy processes over finite alphabets admit only invariant HMM presentations with uncountable state sets.

Keywords: stationary stochastic process, hidden Markov model, epsilon-machine, ergodicity, entropy rate, excess entropy, mutual information

PACS numbers: 02.50.-r 89.70.+C 05.45.Tp 02.50.Ey 



For a stationary process $(X_t)$ the excess entropy $E$ is the mutual information between the infinite past $\overleftarrow{X} = \ldots X_{-2} X_{-1}$ and the infinite future $\overrightarrow{X} = X_0 X_1 \ldots$. It has a long history and is widely employed as a measure of correlation and complexity in a variety of fields, from ergodic theory and dynamical systems to neuroscience and linguistics [1-6]; see Ref. [7] and references therein for a review.

An important question in classifying a given process is whether it is finitary (finite excess entropy) or infinitary 
(infinite excess entropy). Over a finite alphabet, many of the simple process classes commonly studied are always 
finitary. These include all i.i.d. processes, Markov chains, and processes with finite-state hidden Markov model 
(HMM) presentations. There also exist several well known examples of finite-alphabet infinitary processes, though. 
For instance, the symbolic dynamics at the onset of chaos in the logistic map and similar dynamical systems [7] and 
the stationary representation of the binary Fibonacci sequence [8] are both infinitary. 

These latter processes, however, only admit invariant HMM presentations [*] with uncountable state sets. Indeed, 
any process generated by an invariant countable-state HMM either has positive entropy rate or consists entirely of periodic sequences, and these processes do neither; see App. B. Versions of the Santa Fe Process introduced in Ref. [6] are
finite-alphabet infinitary processes with positive entropy rate. However, they were not constructed directly as HMMs, 
and it seems unlikely that they should have any invariant countable-state presentations. To the best of our knowledge, 
to date there are no examples of finite-alphabet, infinitary processes with invariant countable-state presentations. 

We present two such examples. The first is nonergodic, and the information conveyed from the past to the future 
essentially consists of the ergodic component along a given realization. This example is straightforward to construct 
and, though previously unpublished, we suspect that others are aware of this or similar constructions. The second, 
ergodic example, though, is more involved and we believe that both its structure and properties are novel. 

To put these contributions in perspective, note that any stationary finite-alphabet process may be trivially represented as an invariant HMM with an uncountable state set, in which each infinite history $\overleftarrow{x}$ corresponds to a single state. Thus, it is clear that invariant HMMs with uncountable state sets can generate finite-alphabet infinitary processes. In contrast, for any finite-state HMM $E$ is always finite, bounded by the logarithm of the number of states. The case of countable-state HMMs lies in between the finite-state and uncountable-state cases, and it was previously not clear whether it is possible to have countable-state invariant HMMs that generate infinitary finite-alphabet processes and, in particular, ergodic ones. Here, we show that infinite excess entropy is indeed possible for processes with countable-state generators, even when ergodicity is required.

[*] Invariant here means that the state sequence of the underlying Markov chain and, hence, the output sequence generated by the HMM are both stationary processes [5].



II. BACKGROUND 
A. Excess Entropy 



Definition 1. For a stationary, finite-alphabet process $(X_t)_{t \in \mathbb{Z}}$ the excess entropy $E$ is the mutual information between the infinite past $\overleftarrow{X} = \ldots X_{-2} X_{-1}$ and the infinite future $\overrightarrow{X} = X_0 X_1 \ldots$:

$$E = I[\overleftarrow{X}; \overrightarrow{X}] = \lim_{t \to \infty} I[\overleftarrow{X}^t; \overrightarrow{X}^t] , \qquad (1)$$

where $\overleftarrow{X}^t = X_{-t} \ldots X_{-1}$ and $\overrightarrow{X}^t = X_0 \ldots X_{t-1}$ are the length-$t$ past and future, respectively.

In Refs. [3, 9] it is shown that $E$ may also be expressed as:

$$E = \lim_{t \to \infty} \left( H[\overrightarrow{X}^t] - h_\mu t \right) , \qquad (2)$$

where $h_\mu$ is the process entropy rate:

$$h_\mu = \lim_{t \to \infty} \frac{H[\overrightarrow{X}^t]}{t} = \lim_{t \to \infty} H[X_t | \overrightarrow{X}^t] . \qquad (3)$$

That is, the excess entropy $E$ is the asymptotic amount of entropy (information) in length-$t$ blocks of random variables beyond that explained by the entropy rate. The excess entropy derives its name from this formulation. We also use this formulation to establish that the process of Sec. III A is infinitary.

Expanding the block entropy $H[\overrightarrow{X}^t]$ in Eq. (2) with the chain rule and recombining terms gives another important formulation:

$$E = \sum_{t=1}^{\infty} \left( h_\mu(t) - h_\mu \right) , \qquad (4)$$

where $h_\mu(t)$ is the length-$t$ entropy-rate approximation:

$$h_\mu(t) = H[X_{t-1} | \overrightarrow{X}^{t-1}] , \qquad (5)$$

the conditional entropy in the $t$th symbol given the previous $t-1$ symbols. This final formulation will be used to establish that the process of Sec. III B is infinitary.
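For readers who want the intermediate step between Eqs. (2) and (4), the chain-rule argument can be sketched as follows. It uses only the definitions above. The limit exists in $[0, \infty]$ because $h_\mu(s)$ is nonincreasing in $s$ with limit $h_\mu$, so every summand is nonnegative.

```latex
% Chain rule: the block entropy is the sum of the entropy-rate approximations of Eq. (5).
H[\overrightarrow{X}^t] = \sum_{s=1}^{t} H[X_{s-1} \mid \overrightarrow{X}^{s-1}]
                        = \sum_{s=1}^{t} h_\mu(s)

% Subtract h_\mu once per term and let t \to \infty, as in Eq. (2).
H[\overrightarrow{X}^t] - h_\mu t = \sum_{s=1}^{t} \bigl( h_\mu(s) - h_\mu \bigr)
\;\xrightarrow[t \to \infty]{}\;
\sum_{s=1}^{\infty} \bigl( h_\mu(s) - h_\mu \bigr) = E
```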



B. Hidden Markov Models 



There are two primary types of hidden Markov models: edge-emitting (or Mealy) and state-emitting (or Moore). We work with the former, edge-emitting type, but the two are equivalent in that any model of one type over a finite alphabet may be converted to a model of the other type without changing the cardinality of the state set by more than a constant factor, namely the alphabet size. Thus, for our purposes, Mealy HMMs are sufficiently general. We also consider only invariant HMMs, as defined in [*], over finite alphabets and with countable state sets.

Definition 2. An invariant, edge-emitting, countable-state, finite-alphabet hidden Markov model (hereafter referred to simply as a countable-state HMM) is a 4-tuple $(\mathcal{S}, \mathcal{X}, \{T^{(x)}\}, \pi)$ where:

1. $\mathcal{S}$ is a countable set of states.

2. $\mathcal{X}$ is a finite alphabet of output symbols.

3. $T^{(x)}$, $x \in \mathcal{X}$, are symbol-labeled transition matrices. $T^{(x)}_{\sigma\sigma'}$ is the probability that state $\sigma$ transitions to state $\sigma'$ on symbol $x$.

4. $\pi$ is an invariant or stationary distribution for the underlying Markov chain over states with transition matrix $T = \sum_{x \in \mathcal{X}} T^{(x)}$. That is, $\pi$ satisfies $\pi = \pi T$.

Remark. "Countable" in Property 1 means either finite or countably infinite. If the state set $\mathcal{S}$ is finite, we also refer to the HMM as finite-state.

A hidden Markov model may be depicted as a directed graph with labeled edges. The vertices are the states $\sigma \in \mathcal{S}$ and, for all $\sigma, \sigma' \in \mathcal{S}$ with $T^{(x)}_{\sigma\sigma'} > 0$, there is a directed edge from state $\sigma$ to state $\sigma'$ labeled $p|x$ for the symbol $x$ and transition probability $p = T^{(x)}_{\sigma\sigma'}$. These probabilities are normalized so that the sum of probabilities on all outgoing edges from each state is 1. An example is given in Fig. 1.




FIG. 1: A hidden Markov model (the ε-machine) for the Even Process. The support for this process consists of all binary sequences in which blocks of uninterrupted 1s are even in length, bounded by 0s. After each even length is reached, there is a probability $p$ of breaking the block of 1s by inserting a 0. The machine has two internal states $\mathcal{S} = \{\sigma_1, \sigma_2\}$, a two-symbol alphabet $\mathcal{X} = \{0, 1\}$, and a single parameter $p \in (0, 1)$ that controls the transition probabilities. The associated Markov chain over states is finite-state and irreducible and, thus, has a unique stationary distribution $\pi = (\pi_1, \pi_2) = (1/(2-p), (1-p)/(2-p))$. The graphical representation of the machine is given on the left, with the corresponding transition matrices on the right. In the graphical representation the symbols labeling the transitions have been colored blue, for visual contrast, while the transition probabilities are black.

The operation of an HMM may be thought of as a weighted random walk on the associated graph. That is, from the current state $\sigma$ the next state $\sigma'$ is determined by following an outgoing edge from $\sigma$ according to the edges' relative probabilities (or weights). During the transition, the HMM outputs the symbol $x$ labeling this edge.

The state sequence $(S_t)$ determined in such a fashion is simply a Markov chain with transition matrix $T$. However, we are interested not simply in the state sequence of the HMM, but rather the associated sequence of output symbols $(X_t)$ that are generated by reading the labels off the edges as they are followed. The interpretation is that an observer of the HMM may directly observe this sequence of output symbols, but not the hidden internal states. Alternatively, one may consider the Markov chain over edges $(E_t)$, of which the observed symbol sequence $(X_t)$ is simply a projection.

In either case, the process $(X_t)$ generated by the HMM $(\mathcal{S}, \mathcal{X}, \{T^{(x)}\}, \pi)$ is defined as the output sequence of edge symbols, which results from running the Markov chain over states according to the stationary law with marginals $P(S_0) = P(S_t) = \pi$. It is easy to verify that this process is itself stationary, with word probabilities given by:

$$P(w) = \| \pi T^{(w)} \|_1 , \qquad (6)$$

where for a given word $w = w_1 \ldots w_n \in \mathcal{X}^+$, $T^{(w)}$ is the word transition matrix $T^{(w)} = T^{(w_1)} \cdots T^{(w_n)}$. The process language is the set of words $\mathcal{L} = \{w : P(w) > 0\}$.
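As an illustration of Eq. (6), the following minimal Python sketch evaluates word probabilities for the Even Process of Fig. 1. The edge structure is an assumption reconstructed from the figure caption (state 1 emits 0 with probability p and stays, or emits 1 with probability 1 - p and moves to state 2; state 2 emits 1 and returns), and the names `even_process` and `word_probability` are hypothetical helpers, not part of the original text.

```python
import itertools
import numpy as np

def even_process(p):
    """Labeled transition matrices and stationary distribution for the
    two-state Even Process (assumed edge structure, see lead-in)."""
    T0 = np.array([[p, 0.0], [0.0, 0.0]])        # state 1 --0--> state 1
    T1 = np.array([[0.0, 1.0 - p], [1.0, 0.0]])  # 1 --1--> 2 and 2 --1--> 1
    pi = np.array([1.0, 1.0 - p]) / (2.0 - p)
    return {0: T0, 1: T1}, pi

def word_probability(word, T, pi):
    """P(w) = || pi T^(w1) ... T^(wn) ||_1 as in Eq. (6)."""
    v = pi.copy()
    for x in word:
        v = v @ T[x]
    # All entries are nonnegative, so the 1-norm is just the sum.
    return float(v.sum())

T, pi = even_process(0.5)
# pi is stationary for T = T0 + T1.
assert np.allclose(pi @ (T[0] + T[1]), pi)
# Length-3 word probabilities sum to one; the word 0 1 0 is forbidden.
total = sum(word_probability(w, T, pi) for w in itertools.product([0, 1], repeat=3))
print(total, word_probability((0, 1, 0), T, pi))
```

The word 0 1 0 has probability zero because it would contain an odd-length block of 1s bounded by 0s, which is outside the support described in the caption.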

Remark. Even for a noninvariant HMM $(\mathcal{S}, \mathcal{X}, \{T^{(x)}\}, \pi)$, where the state distribution $\pi$ is not stationary, one may always define a one-sided process $(X_t)_{t \geq 0}$ with marginals given by:

$$P(\overrightarrow{X}^{|w|} = w) = \| \pi T^{(w)} \|_1 . \qquad (7)$$

Furthermore, though the state sequence $(S_t)_{t \geq 0}$ will not be a stationary process if $\pi$ is not a stationary distribution for $T$, the output sequence $(X_t)_{t \geq 0}$ may still be stationary. In fact, Ref. Example 2.9] showed that any one-sided process over a finite alphabet $\mathcal{X}$, stationary or not, may be represented as a countable-state noninvariant HMM in which the states correspond to finite-length words in $\mathcal{X}^+$, of which there are only countably many. By stationarity, a one-sided stationary process generated by such a noninvariant HMM can be uniquely extended to a two-sided stationary process. So, in a sense, any two-sided stationary process $(X_t)_{t \in \mathbb{Z}}$ can be said to be generated by a noninvariant countable-state HMM. This is a slightly unnatural interpretation of process generation, though, in that the two-sided process $(X_t)_{t \in \mathbb{Z}}$ is not directly the process obtained by reading symbols off the edges of the HMM as it runs along, transitioning between states in bi-infinite time. In either case, the space of stationary finite-alphabet processes generated by noninvariant countable-state HMMs is too large: it includes all stationary finite-alphabet processes. In particular, if one allows finite-alphabet processes generated by noninvariant countable-state HMMs, there are clearly infinitary examples. For this reason, we restrict to the case of invariant HMMs, where both the state sequence $(S_t)$ and output sequence $(X_t)$ are stationary. In the following development HMM will implicitly mean invariant HMM, but this will no longer be stated.



We consider now an important property known as unifilarity. This property is useful in that many quantities are analytically computable only for unifilar HMMs. In particular, for unifilar HMMs the entropy rate $h_\mu$ is often directly computable, unlike the nonunifilar case. Both of the examples constructed in Sec. III are unifilar, as is the Even Process HMM of Fig. 1.

Definition 3. An HMM $(\mathcal{S}, \mathcal{X}, \{T^{(x)}\}, \pi)$ is unifilar if for each $\sigma \in \mathcal{S}$ and $x \in \mathcal{X}$ there is at most one outgoing edge from state $\sigma$ labeled with symbol $x$ in the associated graph $G$.

It is well known that for any finite-state unifilar HMM the entropy rate in the output process $(X_t)$ is simply the conditional entropy in the next symbol given the current state:

$$h_\mu = H[X_0 | S_0] = \sum_{\sigma \in \mathcal{S}} \pi_\sigma h_\sigma , \qquad (8)$$

where $\pi_\sigma$ is the stationary probability of state $\sigma$ and $h_\sigma = H[X_0 | S_0 = \sigma]$ is the conditional entropy in the next symbol given that the current state is $\sigma$.
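The unifilarity condition and the entropy-rate formula of Eq. (8) are easy to check numerically for a finite-state machine. The sketch below does so for the Even Process, again under the assumed edge structure read off the caption of Fig. 1; `is_unifilar` and `unifilar_entropy_rate` are hypothetical helper names, and the parameter value p = 0.5 is an arbitrary illustrative choice.

```python
import numpy as np

def is_unifilar(T):
    """True if every state has at most one outgoing edge per symbol."""
    return all((np.count_nonzero(Tx, axis=1) <= 1).all() for Tx in T.values())

def unifilar_entropy_rate(T, pi):
    """h_mu = sum_sigma pi_sigma * H[X_0 | S_0 = sigma], in bits (Eq. (8))."""
    # Row sums of T^(x) give P(X_0 = x | S_0 = sigma).
    symbol_probs = np.stack([Tx.sum(axis=1) for Tx in T.values()], axis=1)
    # Replace zeros by ones inside the log so that 0 * log2(0) is treated as 0.
    safe = np.where(symbol_probs > 0, symbol_probs, 1.0)
    h_sigma = -(symbol_probs * np.log2(safe)).sum(axis=1)
    return float(pi @ h_sigma)

p = 0.5  # illustrative parameter value (an assumption)
T = {0: np.array([[p, 0.0], [0.0, 0.0]]),
     1: np.array([[0.0, 1.0 - p], [1.0, 0.0]])}
pi = np.array([1.0, 1.0 - p]) / (2.0 - p)
rate = unifilar_entropy_rate(T, pi)
print(is_unifilar(T), rate)
```

For p = 0.5 the first state contributes one bit with weight 2/3 and the second state contributes nothing, so the formula gives 2/3 bits per symbol.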

We are unaware, though, of any proof that this is generally true for countable-state HMMs. If the entropy in the stationary distribution $H[\pi]$ is finite, then a proof along the lines given in Ref. [11] carries through to the countable-state case and Eq. (8) still holds. However, countable-state HMMs may sometimes have $H[\pi] = \infty$. Furthermore, it can be shown [9] that the excess entropy $E$ is always bounded above by $H[\pi]$, so any HMM generating an infinitary process necessarily has $H[\pi] = \infty$. Hence, for the infinitary process of Sec. III B we need slightly more than unifilarity to establish the value of $h_\mu$. To this end, we consider a property known as exactness.

Definition 4. An HMM is said to be exact if for a.e. infinite future $\overrightarrow{x} = x_0 x_1 \ldots$ generated by the HMM an observer synchronizes to the internal state after a finite time. That is, for a.e. $\overrightarrow{x}$ there exists $t \in \mathbb{N}$ such that $H[S_t | \overrightarrow{X}^t = \overrightarrow{x}^t] = 0$, where $\overrightarrow{x}^t = x_0 x_1 \ldots x_{t-1}$ denotes the first $t$ symbols of a given $\overrightarrow{x}$.

In App. A we prove the following proposition.

Proposition 1. For any countable-state, exact, unifilar HMM, the entropy rate is given by the standard formula of Eq. (8).

The HMM constructed in Sec. III B is both exact and unifilar, so Prop. 1 applies. Using this explicit formula for $h_\mu$, we will show that $E = \sum_{t=1}^{\infty} (h_\mu(t) - h_\mu)$ is infinite.



III. CONSTRUCTIONS



We present two constructions of (invariant) countable-state HMMs that generate infinitary processes. In the first 
example the output process is not ergodic, but in the second it is. 



A. Heavy-Tailed Periodic Mixture: An infinitary nonergodic process with a countable-state presentation

Figure 2 depicts a countable-state HMM $M$ for a nonergodic infinitary process $\mathcal{P}$. The machine $M$ consists of a countable collection of disjoint strongly connected subcomponents $M_i$, $i \geq 2$. For each $i$, the component $M_i$ generates the periodic process $\mathcal{P}_i$ consisting of $i-1$ 1s followed by a 0. The weighting over components is taken as a heavy-tailed distribution with infinite entropy. For this reason, we refer to the process $M$ generates as the Heavy-Tailed Periodic Mixture (HPM) Process.

Intuitively, the information transmitted from the past to the future for the HPM Process is the ergodic component $i$ along with the phase of the period-$i$ process $\mathcal{P}_i$ in this component. This is more information than simply the ergodic component $i$, which is itself an infinite amount of information: $H[(\mu_2, \mu_3, \ldots)] = \infty$. Hence, $E$ should be infinite. This intuition can be made precise using the ergodic decomposition theorem of Debowski [12], but we present a more direct proof here.
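The heavy-tailed weighting can be explored numerically. The sketch below assumes component weights proportional to $1/(i \log^2 i)$ with base-2 logarithms, as stated in the caption of Fig. 2, and estimates the normalizing constant from a long truncation plus an integral estimate of the tail; the helper names and the truncation length are illustrative assumptions. It shows the total mass settling down while the partial sums of the entropy keep growing. A finite computation cannot prove divergence, and the growth is only of order $\log \log n$, so the increase is slow.

```python
import numpy as np

def component_weights(n_max):
    """Unnormalized weights 1/(i log2(i)^2) for i = 2..n_max."""
    i = np.arange(2, n_max + 1, dtype=float)
    return 1.0 / (i * np.log2(i) ** 2)

def partial_entropy_sum(n_max, C):
    """Partial sum of -mu_i log2(mu_i) for i = 2..n_max, mu_i = C/(i log2(i)^2)."""
    mu = C * component_weights(n_max)
    return float(-(mu * np.log2(mu)).sum())

# Estimate C from a truncation at N plus the integral tail (ln 2)^2 / ln N,
# which approximates sum_{i > N} 1/(i log2(i)^2).
N = 10**6
mass = component_weights(N).sum() + np.log(2) ** 2 / np.log(N)
C = 1.0 / mass
for n in (10**2, 10**4, 10**6):
    print(n, partial_entropy_sum(n, C))
```

Every term $-\mu_i \log \mu_i$ is positive, so the printed partial sums are increasing in $n$; they behave like a constant multiple of $\sum_i 1/(i \log i)$, which diverges.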

Proposition 2. The HPM Process has infinite excess entropy.

Proof. For the HPM Process $\mathcal{P}$ we will show that (i) $\lim_{t \to \infty} H[\overrightarrow{X}^t] = \infty$ and (ii) $h_\mu = 0$. The conclusion then follows immediately from Eq. (2). To this end, we define sets:

$$W_{i,t} = \{ w : |w| = t \text{ and } w \text{ is in the support of process } \mathcal{P}_i \} ,$$

$$U_t = \bigcup_{2 \leq i \leq t/2} W_{i,t} , \text{ and}$$

$$V_t = \bigcup_{i > t/2} W_{i,t} .$$
FIG. 2: A countable-state HMM for the Heavy-Tailed Periodic Mixture Process. The machine $M$ is the union of the machines $M_i$, $i \geq 2$, generating the period-$i$ processes of $i-1$ 1s followed by a 0. All topologically allowed transitions have probability 1. So, for visual clarity these probabilities are omitted from the edge labels and only the symbols labeling the transitions are given. The stationary distribution $\pi$ is chosen such that the combined probability $\mu_i$ of all states in the $i$th component is $\mu_i = C/(i \log^2 i)$, where $C = 1/\left(\sum_{i=2}^{\infty} 1/(i \log^2 i)\right)$ is a normalizing constant. Formally, the HMM $M = (\mathcal{S}, \mathcal{X}, \{T^{(x)}\}, \pi)$ has alphabet $\mathcal{X} = \{0, 1\}$, state set $\mathcal{S} = \{\sigma_{ij} : i \geq 2, 1 \leq j \leq i\}$, stationary distribution $\pi$ defined by $\pi_{ij} = C/(i^2 \log^2 i)$, and transition probabilities $T^{(1)}_{\sigma_{ij} \sigma_{i(j+1)}} = 1$ for $i \geq 2$ and $1 \leq j < i$, $T^{(0)}_{\sigma_{ii} \sigma_{i1}} = 1$ for $i \geq 2$, and all other transition probabilities 0. Note that all logs here (and throughout) are taken base 2, as is typical when using information-theoretic quantities.

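Note that any word $w \in W_{i,t}$ with $i \leq t/2$ contains at least two 0s. Therefore:

1. No two distinct states $\sigma_{ij}$ and $\sigma_{i'j'}$ with $i, i' \leq t/2$ generate the same length-$t$ word.

2. The sets $W_{i,t}$, $i \leq t/2$, are disjoint from both each other and $V_t$.

It follows that each word $w \in W_{i,t}$, with $i \leq t/2$, can only be generated from a single state $\sigma_{ij}$ of the HMM and has probability:

$$P(w) = P(\overrightarrow{X}^t = w) = P(S_0 = \sigma_{ij}) \cdot P(\overrightarrow{X}^t = w | S_0 = \sigma_{ij}) = C/(i^2 \log^2 i) . \qquad (9)$$

Hence, for any fixed $t$:

$$H[\overrightarrow{X}^t] = \sum_{|w| = t} P(w) \log \frac{1}{P(w)} \geq \sum_{w \in U_t} P(w) \log \frac{1}{P(w)} = \sum_{i=2}^{\lfloor t/2 \rfloor} \sum_{j=1}^{i} \frac{C}{i^2 \log^2 i} \log \frac{i^2 \log^2 i}{C} = \sum_{i=2}^{\lfloor t/2 \rfloor} \frac{C}{i \log^2 i} \log \frac{i^2 \log^2 i}{C} ,$$

so, since the summand is asymptotically $2C/(i \log i)$ and $\sum_i 1/(i \log i)$ diverges:

$$\lim_{t \to \infty} H[\overrightarrow{X}^t] \geq \sum_{i=2}^{\infty} \frac{C}{i \log^2 i} \log \frac{i^2 \log^2 i}{C} = \infty , \qquad (10)$$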
which proves Claim (i). Now, to prove Claim (ii) consider the quantity:

$$h_\mu(t+1) = H[X_t | \overrightarrow{X}^t] = \sum_{w \in U_t} P(w) \cdot H[X_t | \overrightarrow{X}^t = w] + \sum_{w \in V_t} P(w) \cdot H[X_t | \overrightarrow{X}^t = w] . \qquad (11)$$

On the one hand, for $w \in U_t$, $H[X_t | \overrightarrow{X}^t = w] = 0$ since the current state and, hence, entire future are completely determined by any word $w \in U_t$. On the other hand, for $w \in V_t$, $H[X_t | \overrightarrow{X}^t = w] \leq 1$ since the alphabet is binary. Moreover, the combined probability of all words in the set $V_t$ is simply the probability of starting in some component $M_i$ with $i > t/2$: $P(V_t) = \sum_{i > t/2} \mu_i$. Thus, by Eq. (11), $h_\mu(t+1) \leq \sum_{i > t/2} \mu_i$. Since $\sum_i \mu_i$ converges, it follows that $h_\mu(t) \searrow 0$, which verifies Claim (ii).

□



B. Branching Copy Process: An infinitary ergodic process with a countable-state presentation 

Figure 3 depicts a countable-state HMM $M$ for the ergodic, infinitary Branching Copy Process. Essentially, the machine $M$ consists of a binary tree with loop backs and a self-loop on the root node. From the root node a path is chosen down the tree with each left-right (or 0-1) choice equally likely. But, at each step there is also a chance of turning back towards the root. The path back is not a single step, however. It has length equal to the number of steps taken down the tree before returning back, and copies the path taken down symbol-wise with 0s replaced by 2s and 1s replaced by 3s. There is also a high self-loop probability at the root node on symbol 4, so some number of 4s will normally be generated after returning to the root node before proceeding again down the tree. The process generated by this machine is referred to as the Branching Copy (BC) Process, because the branch taken down the tree is copied on the loop back to the root.
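The verbal description above can be turned into a small sampler. The sketch below is only an illustration under stated assumptions: it takes the continuation probabilities $q_i = i^2/[2(i+1)^2]$ for $i \geq 1$ from the caption of Fig. 3, treats $q_0$ as a free parameter, and assumes the path is copied back in the same order in which it was generated. The function names are hypothetical. It also checks that the illustrative choice $q_0 = 10^{-4}$ satisfies the root-entropy constraint stated in the caption.

```python
import math
import random

def q(i, q0):
    """Per-branch continuation probability at depth i (q0 is a free parameter)."""
    return q0 if i == 0 else i * i / (2.0 * (i + 1) ** 2)

def excursion(rng, q0):
    """Symbols emitted from leaving the root until the next return to it."""
    if rng.random() >= 2.0 * q(0, q0):
        return [4]                      # self-loop at the root
    path = [rng.randrange(2)]           # first step down the tree
    while rng.random() < 2.0 * q(len(path), q0):
        path.append(rng.randrange(2))   # continue down, 0 or 1 equally likely
    # Turn back: copy the path with 0 -> 2 and 1 -> 3 (copy order is an assumption).
    return path + [x + 2 for x in path]

def sample(rng, q0, n_excursions):
    """Concatenate the symbols of several consecutive excursions from the root."""
    out = []
    for _ in range(n_excursions):
        out.extend(excursion(rng, q0))
    return out

q0 = 1e-4                               # illustrative value, not from the source
p0 = 1.0 - 2.0 * q0
root_entropy = -(p0 * math.log2(p0) + 2.0 * q0 * math.log2(q0))
print(root_entropy < 1 / 300)           # the constraint on q0 from the construction
print(sample(random.Random(0), 0.25, 20))
```

The second call uses a much larger $q_0$ than the construction allows, purely so that a short sample shows some excursions down the tree rather than only 4s.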

By inspection we see that the machine is unifilar with synchronizing word $w = 4$, i.e., $H[S_1 | X_0 = 4] = 0$. Since the underlying Markov chain over states $(S_t)$ is positive recurrent, the state sequence $(S_t)$ and symbol sequence $(X_t)$ are both ergodic. Thus, a.e. infinite future $\overrightarrow{x}$ contains a 4, so the machine is exact. Therefore, Prop. 1 may be applied, and we know the entropy rate is given by the standard formula of Eq. (8): $h_\mu = \sum_\sigma \pi_\sigma h_\sigma$. Since $P(S_t = \sigma) = \pi_\sigma$ for any $t \in \mathbb{N}$, we may alternatively represent this entropy rate as:

$$h_\mu = \sum_{\sigma} \Big( \sum_{w \in \mathcal{L}_t} P(w) \phi(w)_\sigma \Big) h_\sigma = \sum_{w \in \mathcal{L}_t} P(w) \Big( \sum_{\sigma} \phi(w)_\sigma h_\sigma \Big) = \sum_{w \in \mathcal{L}_t} P(w) \tilde{h}_w , \qquad (12)$$

where $\mathcal{L}_t = \{w : |w| = t, P(w) > 0\}$ is the set of length-$t$ words in the process language $\mathcal{L}$, $\phi(w)$ is the conditional state distribution induced by the word $w$ (i.e., $\phi(w)_\sigma = P(S_t = \sigma | \overrightarrow{X}^t = w)$), and $\tilde{h}_w = \sum_\sigma \phi(w)_\sigma h_\sigma$ is the $\phi(w)$-weighted average entropy in the next symbol given knowledge of the current state $\sigma$.

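Similarly, for any $t \in \mathbb{N}$ the entropy-rate approximation $h_\mu(t+1)$ may be expressed as:

$$h_\mu(t+1) = H[X_t | \overrightarrow{X}^t] = \sum_{w \in \mathcal{L}_t} P(w) h_w , \qquad (13)$$

where $h_w = H[X_t | \overrightarrow{X}^t = w] = H[X_0 | S_0 \sim \phi(w)]$ is the entropy in the next symbol given the word $w$. Combining Eqs. (12) and (13) we have for any $t \in \mathbb{N}$:

$$h_\mu(t+1) - h_\mu = \sum_{w \in \mathcal{L}_t} P(w) \left( h_w - \tilde{h}_w \right) . \qquad (14)$$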
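The identity in Eq. (14) can be checked numerically on a finite-state stand-in. The Branching Copy machine itself has infinitely many states, so the sketch below instead uses the two-state Even Process of Fig. 1 (with the edge structure assumed from its caption and an arbitrary parameter p = 0.5) and compares both sides of the identity for short word lengths. Here `h_tilde` denotes the $\phi(w)$-weighted average entropy and `h_w` the entropy of the next symbol given the word; all names are illustrative.

```python
import itertools
import numpy as np

def entropy(prob):
    """Shannon entropy in bits of a probability vector."""
    prob = prob[prob > 0]
    return float(-(prob * np.log2(prob)).sum())

p = 0.5  # Even Process parameter (illustrative choice)
T = {0: np.array([[p, 0.0], [0.0, 0.0]]),
     1: np.array([[0.0, 1.0 - p], [1.0, 0.0]])}
pi = np.array([1.0, 1.0 - p]) / (2.0 - p)
# h_sigma = H[X_0 | S_0 = sigma] from the row sums of the labeled matrices.
h_sigma = np.array([entropy(np.array([T[x][s].sum() for x in T])) for s in range(2)])
h_mu = float(pi @ h_sigma)  # Eq. (8), valid here since the machine is finite-state unifilar

def gap(t):
    """Return (h_mu(t+1) - h_mu, sum_w P(w) (h_w - h_tilde_w)) over length-t words."""
    lhs, rhs = -h_mu, 0.0
    for w in itertools.product(T, repeat=t):
        v = pi.copy()
        for x in w:
            v = v @ T[x]
        P_w = v.sum()
        if P_w == 0:
            continue                     # word not in the process language
        phi = v / P_w                    # state distribution induced by w
        h_w = entropy(np.array([(phi @ T[x]).sum() for x in T]))
        h_tilde = float(phi @ h_sigma)   # phi-weighted average of h_sigma
        lhs += P_w * h_w
        rhs += P_w * (h_w - h_tilde)
    return lhs, rhs

print([gap(t) for t in range(1, 6)])
```

The two entries of each pair should agree up to floating-point error, and both are nonnegative, consistent with the concavity argument.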



By concavity of the entropy function, the quantity $h_w - \tilde{h}_w$ is always nonnegative. Furthermore, in Claim 5 we show that $h_w - \tilde{h}_w$ is always bounded below by some fixed positive constant for any word $w$ consisting entirely of 2s and 3s. Also, in Claim 3 we show that $P(W_t)$ scales as $1/t$, where $W_t$ is the set of length-$t$ words consisting entirely of 2s and 3s. Combining these results it follows that $h_\mu(t+1) - h_\mu \gtrsim 1/t$ and, hence, the sum $E = \sum_{t=1}^{\infty} (h_\mu(t) - h_\mu)$ is infinite.

FIG. 3: A countable-state HMM for the Branching Copy Process. The machine $M$ is essentially a binary tree with loop-back paths from each node in the tree to the root node and a self-loop on the root. At each node $\sigma^1_{ij}$ in the tree there is a probability $2q_i$ of continuing down the tree and a probability $p_i = 1 - 2q_i$ of turning back towards the root $\sigma^1_{01}$ on path $l_{ij} \sim \sigma^1_{ij} \to \sigma^2_{ij} \to \sigma^3_{ij} \cdots \to \sigma^i_{ij} \to \sigma^1_{01}$. If the choice is made to head back, the next $i-1$ transitions are deterministic. The path of 0s and 1s taken to get from $\sigma^1_{01}$ to $\sigma^1_{ij}$ is copied on the return with 0s replaced by 2s and 1s replaced by 3s. Formally, the alphabet is $\mathcal{X} = \{0, 1, 2, 3, 4\}$ and the state set is $\mathcal{S} = \{\sigma^k_{ij} : i \geq 0, 1 \leq j \leq 2^i, 1 \leq k \leq \max\{i, 1\}\}$. The nonzero transition probabilities are as depicted graphically with $p_i = 1 - 2q_i$ for all $i \geq 0$, $q_i = i^2/[2(i+1)^2]$ for all $i \geq 1$, and $q_0 > 0$ taken sufficiently small so that $H[(p_0, q_0, q_0)] \leq 1/300$. The graph is strongly connected so the Markov chain over states is irreducible. Claim 1 shows that the Markov chain is also positive recurrent and, hence, has a unique stationary distribution $\pi$. Claim 2 gives the form of $\pi$.

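A more detailed analysis with the claims and their proofs is given below. In this we will use the following notation:

• $P_\sigma(\cdot) = P(\cdot | S_0 = \sigma)$,

• $V_t = \{w \in \mathcal{L}_t : w \text{ contains only 0s and 1s}\}$ and $W_t = \{w \in \mathcal{L}_t : w \text{ contains only 2s and 3s}\}$,

• $\pi^k_{ij} = P(\sigma^k_{ij})$ is the stationary probability of state $\sigma^k_{ij}$,

• $R_{ij} = \{\sigma^1_{ij}, \sigma^2_{ij}, \ldots, \sigma^i_{ij}\}$, and

• $\pi_{ij} = \sum_{k=1}^{i} \pi^k_{ij}$ and $\pi^k_i = \sum_{j=1}^{2^i} \pi^k_{ij}$.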



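Note that:

$$P_{\sigma^1_{01}}(\overrightarrow{X}^t \in V_t) = \frac{1 - p_0}{t^2} , \text{ for all } t \geq 1 , \qquad (15)$$

and:

$$p_i = \frac{2i+1}{(i+1)^2} \leq \frac{2}{i} , \text{ for all } i \geq 1 . \qquad (16)$$

These facts will be used in the proof of Claim 1.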
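Both facts follow from the form of the transition probabilities, and they can be confirmed with exact rational arithmetic. The sketch below assumes $q_i = i^2/[2(i+1)^2]$ and $p_i = 1 - 2q_i$ for $i \geq 1$, as given in the caption of Fig. 3; the function names are illustrative. The product of the continuation probabilities $2q_i$ telescopes to $1/t^2$, which is the factor multiplying $1 - p_0$ in Eq. (15).

```python
from fractions import Fraction

def q(i):
    """q_i = i^2 / (2 (i+1)^2) for i >= 1."""
    return Fraction(i * i, 2 * (i + 1) ** 2)

def p(i):
    """Turn-back probability p_i = 1 - 2 q_i."""
    return 1 - 2 * q(i)

def descent_factor(t):
    """prod_{i=1}^{t-1} 2 q_i: chance of reaching depth t given the first step down."""
    prod = Fraction(1)
    for i in range(1, t):
        prod *= 2 * q(i)
    return prod

for t in range(1, 8):
    # Columns: t, telescoped product (equals 1/t^2), and p_t (equals (2t+1)/(t+1)^2).
    print(t, descent_factor(t), p(t))
```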

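Claim 1. The underlying Markov chain over states for the HMM is positive recurrent.

Proof. Let $\tau_{\sigma^1_{01}} = \min\{t > 0 : S_t = \sigma^1_{01}\}$ be the first return time to state $\sigma^1_{01}$. Then, by continuity:

$$\begin{aligned}
P_{\sigma^1_{01}}(\tau_{\sigma^1_{01}} = \infty) &= \lim_{t \to \infty} P_{\sigma^1_{01}}(\tau_{\sigma^1_{01}} > 2t) \\
&= \lim_{t \to \infty} P_{\sigma^1_{01}}(\overrightarrow{X}^{t+1} \in V_{t+1}) \\
&= \lim_{t \to \infty} \frac{1 - p_0}{(t+1)^2} \\
&= 0 .
\end{aligned}$$

Hence, the Markov chain is recurrent. Note that the topology of the chain implies the first return time cannot be an odd integer greater than 1, so we have:

$$\begin{aligned}
\mathbb{E}_{\sigma^1_{01}}(\tau_{\sigma^1_{01}}) &= \sum_{t=1}^{\infty} P_{\sigma^1_{01}}(\tau_{\sigma^1_{01}} = t) \cdot t \\
&= p_0 + \sum_{t=1}^{\infty} P_{\sigma^1_{01}}(\tau_{\sigma^1_{01}} = 2t) \cdot 2t \\
&= p_0 + \sum_{t=1}^{\infty} \frac{1 - p_0}{t^2} \cdot p_t \cdot 2t \\
&\leq p_0 + \sum_{t=1}^{\infty} \frac{1 - p_0}{t^2} \cdot \frac{2}{t} \cdot 2t \\
&< \infty ,
\end{aligned}$$

where the third line uses Eq. (15) and the fourth uses Eq. (16). It follows that the chain is also positive recurrent. □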



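Claim 2. The stationary distribution $\pi$ has:

$$\pi^1_{ij} = \frac{C}{i^2 \cdot 2^i} , \quad i \geq 1 , \ 1 \leq j \leq 2^i , \qquad (17)$$

$$\pi^k_{ij} = \frac{C (2i+1)}{i^2 (i+1)^2 \cdot 2^i} , \quad i \geq 2 , \ 1 \leq j \leq 2^i , \ 2 \leq k \leq i , \qquad (18)$$

where $C = \pi^1_{01}(1 - p_0)$.

Proof. Existence of a unique stationary distribution $\pi$ is guaranteed by Claim 1. Given this, clearly $\pi^1_1 = \pi^1_{01}(1 - p_0)$. Similarly, for $i \geq 1$, $\pi^1_{i+1} = \pi^1_i (1 - p_i) = \pi^1_i \cdot \frac{i^2}{(i+1)^2}$, from which it follows by induction that $\pi^1_i = \pi^1_{01}(1 - p_0)/i^2$, for all $i \geq 1$. By symmetry $\pi^1_{ij} = \pi^1_i / 2^i$ for each $i \in \mathbb{N}$ and $1 \leq j \leq 2^i$. Therefore, for each $i \in \mathbb{N}$, $1 \leq j \leq 2^i$ we have $\pi^1_{ij} = \pi^1_{01}(1 - p_0)/(i^2 \cdot 2^i) = C/(i^2 \cdot 2^i)$ as was claimed. Moreover, $\pi^2_{ij} = \pi^1_{ij} \cdot p_i = \pi^1_{ij} \cdot \frac{2i+1}{(i+1)^2}$. Combining with the expression for $\pi^1_{ij}$ gives $\pi^2_{ij} = \frac{C(2i+1)}{i^2 (i+1)^2 \cdot 2^i}$. By induction, $\pi^2_{ij} = \pi^3_{ij} = \ldots = \pi^i_{ij}$, so this completes the proof. □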

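Note that for all $i \geq 1$ and $1 \leq j \leq 2^i$:

$$\pi_{ij} = \frac{C}{2^i \cdot i^2} + (i-1) \cdot \frac{C}{2^i \cdot i^2} \cdot \frac{2i+1}{(i+1)^2} \geq \frac{C}{2^i \cdot i^2} , \text{ and} \qquad (19)$$

$$\pi_{ij} = \frac{C}{2^i \cdot i^2} + (i-1) \cdot \frac{C}{2^i \cdot i^2} \cdot \frac{2i+1}{(i+1)^2} \leq \frac{3C}{2^i \cdot i^2} . \qquad (20)$$

Also note that for any $t \in \mathbb{N}$ and $i \geq 2t$ we have for each $1 \leq j \leq 2^i$:

1. $P(\overrightarrow{X}^t \in W_t | S_0 = \sigma^k_{ij}) = 1$, for $2 \leq k \leq \lceil i/2 \rceil + 1$.

2. $\left( \sum_{k=2}^{i} \pi^k_{ij} \right) / \pi_{ij} \geq 1/3$ and $|\{k : 2 \leq k \leq \lceil i/2 \rceil + 1\}| \geq \frac{1}{2} |\{k : 2 \leq k \leq i\}|$. Hence, $\left( \sum_{k=2}^{\lceil i/2 \rceil + 1} \pi^k_{ij} \right) / \pi_{ij} \geq 1/6$.

Therefore, for each $t \in \mathbb{N}$:

$$P(\overrightarrow{X}^t \in W_t | S_0 \in R_{ij}) \geq 1/6 , \text{ for all } i \geq 2t \text{ and } 1 \leq j \leq 2^i . \qquad (21)$$

Equations (19), (20), and (21) will be used in the proof of Claim 3 below, along with the following simple lemma.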

Lemma 1 (Integral Test). Let $n \in \mathbb{N}$ and let $f : [n, \infty) \to \mathbb{R}$ be a positive, continuous, monotone-decreasing function, then:

$$\int_n^{\infty} f(x) \, dx \leq \sum_{k=n}^{\infty} f(k) \leq f(n) + \int_n^{\infty} f(x) \, dx .$$
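For the function $f(x) = 1/x^2$, which is the case needed below, the integral is $1/n$ and the lemma's bounds read $1/n \leq \sum_{k \geq n} 1/k^2 \leq 1/n^2 + 1/n$. The following sketch is a numerical sanity check rather than a proof: it brackets the infinite sum by a long partial sum plus tail bounds (the tail bounds themselves come from the same lemma applied at a large index, so the check is only a consistency test). The function name and the number of terms are illustrative assumptions.

```python
def tail_sum_bounds(n, terms=200_000):
    """Bracket sum_{k>=n} 1/k^2 using a partial sum plus integral-test tail bounds."""
    m = n + terms
    partial = sum(1.0 / (k * k) for k in range(n, m))
    # Lemma applied at m: 1/m <= sum_{k>=m} 1/k^2 <= 1/m^2 + 1/m.
    return partial + 1.0 / m, partial + 1.0 / (m * m) + 1.0 / m

for n in (1, 5, 50):
    low, high = tail_sum_bounds(n)
    # Lemma applied at n: the integral of 1/x^2 from n to infinity is 1/n.
    print(n, 1.0 / n <= low, high <= 1.0 / (n * n) + 1.0 / n)
```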



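Claim 3. $P(W_t)$ decays roughly as $1/t$. More exactly, $C/12t \leq P(W_t) \leq 6C/t$ for all $t \in \mathbb{N}$.

Proof. For any state $\sigma^k_{ij}$ with $i < t$, $P(\overrightarrow{X}^t \in W_t | S_0 = \sigma^k_{ij}) = 0$. Thus, we have:

$$\begin{aligned}
P(W_t) &= P(\overrightarrow{X}^t \in W_t) \\
&= \sum_{i=t}^{\infty} \sum_{j=1}^{2^i} P(S_0 \in R_{ij}) \cdot P(\overrightarrow{X}^t \in W_t | S_0 \in R_{ij}) \\
&= \sum_{i=t}^{\infty} 2^i \cdot P(S_0 \in R_{i1}) \cdot P(\overrightarrow{X}^t \in W_t | S_0 \in R_{i1}) , \qquad (22)
\end{aligned}$$

where the second equality follows from symmetry. We prove the bounds from above and below on $P(W_t)$ separately using Eq. (22).

• Bound from below:

$$\begin{aligned}
P(W_t) &= \sum_{i=t}^{\infty} 2^i \cdot P(S_0 \in R_{i1}) \cdot P(\overrightarrow{X}^t \in W_t | S_0 \in R_{i1}) \\
&\geq \sum_{i=2t}^{\infty} 2^i \cdot P(S_0 \in R_{i1}) \cdot P(\overrightarrow{X}^t \in W_t | S_0 \in R_{i1}) \\
&\overset{(a)}{\geq} \sum_{i=2t}^{\infty} 2^i \cdot \frac{C}{2^i \cdot i^2} \cdot \frac{1}{6} \\
&= \frac{C}{6} \sum_{i=2t}^{\infty} \frac{1}{i^2} \\
&\overset{(b)}{\geq} \frac{C}{6} \int_{2t}^{\infty} \frac{1}{x^2} \, dx \\
&= \frac{C}{12t} . \qquad (23)
\end{aligned}$$

Here, (a) follows from Eqs. (19) and (21) and (b) from Lemma 1.

• Bound from above:

$$\begin{aligned}
P(W_t) &= \sum_{i=t}^{\infty} 2^i \cdot P(S_0 \in R_{i1}) \cdot P(\overrightarrow{X}^t \in W_t | S_0 \in R_{i1}) \\
&\leq \sum_{i=t}^{\infty} 2^i \cdot P(S_0 \in R_{i1}) \\
&\overset{(a)}{\leq} \sum_{i=t}^{\infty} 2^i \cdot \frac{3C}{2^i \cdot i^2} \\
&= 3C \sum_{i=t}^{\infty} \frac{1}{i^2} \\
&\overset{(b)}{\leq} 3C \left( \frac{1}{t^2} + \int_t^{\infty} \frac{1}{x^2} \, dx \right) \\
&= 3C \left( \frac{1}{t^2} + \frac{1}{t} \right) \\
&\leq \frac{6C}{t} .
\end{aligned}$$

Here, (a) follows from Eq. (20) and (b) from Lemma 1.

□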

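Claim 4. $P(X_t \in \{2, 3\} | \overrightarrow{X}^t = w) \geq 1/150$, for all $t \in \mathbb{N}$ and $w \in W_t$.

Proof. Applying Claim 3 we have for any $t \in \mathbb{N}$:

$$\begin{aligned}
P(X_t \in \{2, 3\} | \overrightarrow{X}^t \in W_t) &= P(\overrightarrow{X}^{t+1} \in W_{t+1} | \overrightarrow{X}^t \in W_t) \\
&= P(\overrightarrow{X}^{t+1} \in W_{t+1}, \overrightarrow{X}^t \in W_t) / P(\overrightarrow{X}^t \in W_t) \\
&= P(\overrightarrow{X}^{t+1} \in W_{t+1}) / P(\overrightarrow{X}^t \in W_t) \\
&\geq \frac{C/12(t+1)}{6C/t} \\
&= \frac{1}{72} \cdot \frac{t}{t+1} \\
&\geq \frac{1}{150} ,
\end{aligned}$$

where the last step uses $t/(t+1) \geq 1/2$. By symmetry, $P(X_t \in \{2, 3\} | \overrightarrow{X}^t = w)$ is the same for each $w \in W_t$. Thus, the same bound must also hold for each $w \in W_t$ individually: $P(X_t \in \{2, 3\} | \overrightarrow{X}^t = w) \geq 1/150$ for all $w \in W_t$. □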

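Claim 5. For each $t \in \mathbb{N}$ and $w \in W_t$, (i) $\tilde{h}_w \leq 1/300$ and (ii) $h_w \geq 1/150$. Hence, $h_w - \tilde{h}_w \geq 1/300$.

Proof of (i). $h_{\sigma^k_{ij}} = 0$, for all $i \geq 1$, $1 \leq j \leq 2^i$, and $k \geq 2$. And, for each $w \in W_t$, $\phi(w)_{\sigma^1_{ij}} = 0$, for all $i \geq 1$ and $1 \leq j \leq 2^i$. Hence, for each $w \in W_t$, $\tilde{h}_w = \sum_{\sigma \in \mathcal{S}} \phi(w)_\sigma h_\sigma = \phi(w)_{\sigma^1_{01}} h_{\sigma^1_{01}}$. By construction of the machine $h_{\sigma^1_{01}} \leq 1/300$ and, clearly, $\phi(w)_{\sigma^1_{01}}$ can never exceed 1. Thus, $\tilde{h}_w \leq 1/300$ for all $w \in W_t$. □

Proof of (ii). Let the random variable $Z_t$ be defined by: $Z_t = 1$ if $X_t \in \{2, 3\}$ and $Z_t = 0$ if $X_t \notin \{2, 3\}$. By Claim 4, $P(Z_t = 1 | \overrightarrow{X}^t = w) \geq 1/150$ for any $w \in W_t$ and, by symmetry, the probabilities of a 2 or a 3 following any word $w \in W_t$ are equal, so $P(X_t = 2 | \overrightarrow{X}^t = w, Z_t = 1) = P(X_t = 3 | \overrightarrow{X}^t = w, Z_t = 1) = 1/2$. Therefore, for any $w \in W_t$:

$$\begin{aligned}
h_w &= H[X_t | \overrightarrow{X}^t = w] \\
&\geq H[X_t | \overrightarrow{X}^t = w, Z_t] \\
&\geq P(Z_t = 1 | \overrightarrow{X}^t = w) \cdot H[X_t | \overrightarrow{X}^t = w, Z_t = 1] \\
&\geq 1/150 \cdot 1 .
\end{aligned}$$

□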



Claim 6. The quantity $h_\mu(t) - h_\mu$ decays at a rate no faster than $1/t$. More exactly, $h_\mu(t+1) - h_\mu \geq \frac{C}{3600t}$, for all $t \in \mathbb{N}$.

Proof. As noted above, since the machine satisfies the conditions of Prop. 1 the entropy rate $h_\mu$ is given by Eq. (8) and the difference $h_\mu(t+1) - h_\mu$ is given by Eq. (14). Therefore, applying Claims 3 and 5 we may bound the quantity $h_\mu(t+1) - h_\mu$ as follows:

$$\begin{aligned}
h_\mu(t+1) - h_\mu &= \sum_{w \in \mathcal{L}_t} P(w) \left( h_w - \tilde{h}_w \right) \\
&\geq \sum_{w \in W_t} P(w) \left( h_w - \tilde{h}_w \right) \\
&\geq P(W_t) \cdot \frac{1}{300} \\
&\geq \frac{C}{3600t} .
\end{aligned}$$

□

With the above decay on $h_\mu(t)$ established we easily see the Branching Copy Process must have infinite excess entropy.

Proposition 3. The excess entropy $E$ for the BC Process is infinite.

Proof. $E = \sum_{t=1}^{\infty} (h_\mu(t) - h_\mu)$. By Claim 6, this sum must diverge. □



IV. CONCLUSION 



Any stationary, finite-alphabet process may be represented as an invariant HMM with an uncountable state set. 
Thus, there exist invariant HMMs with uncountable state sets capable of generating infinitary processes over finite 
alphabets. It is impossible, however, to have a finite-state invariant HMM that generates an infinitary process. The excess entropy $E$ is always bounded by the entropy in the stationary distribution $H[\pi]$, which is finite for any finite-state HMM. Countable-state HMMs are intermediate between the finite and uncountable cases, and it was previously
unknown whether infinite excess entropy was possible in this case. We have demonstrated that it is indeed possible, by 
giving two explicit constructions of finite-alphabet infinitary processes generated by invariant HMMs with countable 
state sets. 

The second example, the Branching Copy Process, is also ergodic, which is a strong restriction. It is a priori quite plausible that infinite $E$ might only occur in the countable-state case for nonergodic processes. Moreover, both HMMs we constructed are unifilar, so the ε-machines [13] of the processes have countable state sets as well. Again, unifilarity
is a strong restriction to impose, and it is a priori conceivable that infinite E might only occur in the countable-state 
case for nonunifilar HMMs. Our examples have shown, though, that infinite E is possible for countable-state HMMs, 
even if one requires both ergodicity and unifilarity. 



Appendix A 

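We prove Prop. 1 from Sec. II B, which states that the entropy rate of any countable-state, exact, unifilar HMM is given by the standard formula:

$$h_\mu = H[X_0 | S_0] = \sum_{\sigma} \pi_\sigma h_\sigma . \qquad (A1)$$

Proof. Let $\mathcal{L}_t = \{w : |w| = t, P(w) > 0\}$ be the set of length-$t$ words in the process language $\mathcal{L}$, and let $\phi(w)$ be the conditional state distribution induced by a word $w \in \mathcal{L}_t$: i.e., $\phi(w)_\sigma = P(S_t = \sigma | \overrightarrow{X}^t = w)$. Furthermore, let $\tilde{h}_w = \sum_\sigma \phi(w)_\sigma h_\sigma$ be the $\phi(w)$-weighted average entropy in the next symbol given knowledge of the current state $\sigma$. And let $h_w = H[X_t | \overrightarrow{X}^t = w] = H[X_0 | S_0 \sim \phi(w)]$ be the entropy in the next symbol given the word $w$. Note that:

1. $h_\mu(t+1) = H[X_t | \overrightarrow{X}^t] = \sum_{w \in \mathcal{L}_t} P(w) h_w$, and

2. $\sum_\sigma \pi_\sigma h_\sigma = \sum_\sigma \left( \sum_{w \in \mathcal{L}_t} P(w) \phi(w)_\sigma \right) h_\sigma = \sum_{w \in \mathcal{L}_t} P(w) \left( \sum_\sigma \phi(w)_\sigma h_\sigma \right) = \sum_{w \in \mathcal{L}_t} P(w) \tilde{h}_w$.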
Since we know $h_\mu(t)$ limits to $h_\mu$, it suffices to show that:

$$\lim_{t \to \infty} \sum_{w \in \mathcal{L}_t} P(w) \left( h_w - \tilde{h}_w \right) = 0 . \qquad (A2)$$

By concavity of the entropy function, $h_w - \tilde{h}_w \geq 0$ for any $w$. However, for a synchronizing word $w = w_1 \ldots w_t$ with $H[S_t | \overrightarrow{X}^t = w] = 0$, $h_w - \tilde{h}_w$ is always 0, since the distribution $\phi(w)$ is concentrated only on a single state. Furthermore, for any $w$, $h_w - \tilde{h}_w \leq h_w \leq \log |\mathcal{X}|$. Thus:

$$\sum_{w \in \mathcal{L}_t} P(w) \left( h_w - \tilde{h}_w \right) \leq \log |\mathcal{X}| \cdot P(NS_t) , \qquad (A3)$$

where $NS_t$ is the set of length-$t$ words that are nonsynchronizing and $P(NS_t)$ is the combined probability of all words in this set. Since the HMM is exact, we know that for a.e. infinite future $\overrightarrow{x}$ an observer will synchronize exactly at some finite time $t = t(\overrightarrow{x})$. And, since it is unifilar, the observer will remain synchronized for all $t' \geq t$. It follows that $P(NS_t)$ must be monotonically decreasing and limit to 0:

$$\lim_{t \to \infty} P(NS_t) = 0 . \qquad (A4)$$

Combining Eq. (A3) with Eq. (A4) shows that Eq. (A2) does in fact hold, which completes the proof.

□



Appendix B 

We prove the following proposition for the entropy rate of countable-state HMMs.

Proposition 4. Let $M$ be a countable-state HMM and let $\mathcal{P} = (X_t)$ be the process generated by $M$. If $\mathcal{P}$ does not consist entirely of periodic sequences, then its entropy rate $h_\mu$ is strictly positive.

Proof. For any countable-state HMM $M$, the future output sequence and past output sequence are conditionally independent given the current state. Thus, for all $t \in \mathbb{N}$, $H[X_t | \overrightarrow{X}^t, S_t] = H[X_t | S_t]$. Also, by stationarity $H[X_t | S_t] = H[X_0 | S_0] = \sum_\sigma \pi_\sigma h_\sigma$, for all $t$. Combining these facts shows that the entropy rate is always bounded below by the standard unifilar formula of Eq. (8):

$$\begin{aligned}
h_\mu &= \lim_{t \to \infty} H[X_t | \overrightarrow{X}^t] \\
&\geq \lim_{t \to \infty} H[X_t | S_t, \overrightarrow{X}^t] \\
&= \lim_{t \to \infty} H[X_t | S_t] \\
&= \sum_\sigma \pi_\sigma h_\sigma . \qquad (B1)
\end{aligned}$$

Therefore, the entropy rate is positive if $h_\sigma > 0$ for some state $\sigma$ with nonzero probability $\pi_\sigma$ or, equivalently, if there are at least two outgoing edges in the associated graph from such a state $\sigma$.

Now, assume there is no such state. Consider the restricted state set $\tilde{\mathcal{S}}$ consisting of states $\sigma$ with positive probability ($\pi_\sigma > 0$) and the restricted graph $\tilde{G}$ associated to this state set. Clearly, the HMM $\tilde{M}$ defined by this graph with stationary distribution $\pi$ generates the same process $\mathcal{P}$ as the original HMM. And, it is also easily seen that in order to keep the distribution $\pi$ stationary, the graph $\tilde{G}$ must consist entirely of disjoint strongly connected components. That is, each connected component of $\tilde{G}$ must be strongly connected. Take any strongly connected component $C_i$ in $\tilde{G}$. Since each state $\sigma$ in $C_i$ has only a single outgoing edge and $C_i$ is strongly connected, it follows that $C_i$ must be a deterministic loop of some finite length $l_i$. Since this holds for each strongly connected component $C_i$ in $\tilde{G}$ and the HMM $\tilde{M}$ is always run from one of the $C_i$s, it follows that all sequences $\overleftrightarrow{x} = \ldots x_{-1} x_0 x_1 \ldots$ generated by $\tilde{M}$ are periodic. Or, equivalently, all sequences generated by $M$ are periodic. □